The Markov Decision Process Extraction Network
Authors
Abstract
This paper presents the Markov decision process extraction network, a data-efficient, automatic state estimation approach for discrete-time reinforcement learning (RL) based on recurrent neural networks. The architecture is designed to model the minimal relevant dynamics of an environment: it can condense large sets of continuous observables into a compact state representation while excluding irrelevant information. To the best of our knowledge, it is the first published approach that automatically extracts the minimal relevant aspects of the dynamics from observations to model a Markov decision process suitable for RL, without requiring special knowledge of the environment under consideration. The capabilities of the neural state estimation approach are evaluated on the cart-pole problem using standard table-based policy iteration.
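The pipeline described above can be illustrated with a minimal sketch: a recurrent cell condenses an observation sequence into a compact state, which is then discretized so that table-based policy iteration can operate on it. All dimensions, weights, and the binning scheme below are illustrative assumptions, not the paper's actual trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 cart-pole observables, 2-dim compact state.
OBS_DIM, STATE_DIM = 4, 2

# Randomly initialised weights of a simple Elman-style recurrent cell.
# In the approach sketched above these would be trained so that the
# internal state suffices to predict future observations (a Markov state).
W_in = rng.normal(scale=0.5, size=(STATE_DIM, OBS_DIM))
W_rec = rng.normal(scale=0.5, size=(STATE_DIM, STATE_DIM))

def estimate_state(observations):
    """Condense a sequence of observations into one compact state."""
    s = np.zeros(STATE_DIM)
    for o in observations:
        s = np.tanh(W_in @ o + W_rec @ s)  # recurrent update
    return s

def discretize(state, bins_per_dim=10):
    """Map the continuous state to a table cell for policy iteration."""
    # tanh output lies in (-1, 1); split each dimension into equal bins.
    idx = np.floor((state + 1.0) / 2.0 * bins_per_dim).astype(int)
    return tuple(np.clip(idx, 0, bins_per_dim - 1))

# Usage: a short trajectory of noisy cart-pole-like observations.
trajectory = rng.normal(size=(5, OBS_DIM))
cell = discretize(estimate_state(trajectory))
```

The discretized `cell` would serve as the row index of a tabular value function, which is what makes the combination with standard policy iteration possible.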
Similar papers
Recurrent Neural State Estimation in Domains with Long-Term Dependencies
This paper presents a state estimation approach for reinforcement learning (RL) in a partially observable Markov decision process. It is based on a special recurrent neural network architecture, the Markov decision process extraction network with shortcuts (MPEN-S). In contrast to previous work on this topic, we address the problem of long-term dependencies, which cause major problems in...
Optimizing Red Blood Cells Consumption Using Markov Decision Process
In healthcare systems, one of the important issues concerns perishable products such as red blood cell (RBC) units, whose consumption management across different periods can contribute greatly to the optimality of the system. In this paper, the main goal is to enhance the ability of the medical community to organize RBC unit consumption so that ordered units are delivered on time, with a focus ...
Automated Tumor Segmentation Based on Hidden Markov Classifier using Singular Value Decomposition Feature Extraction in Brain MR images
Introduction: Diagnosing brain tumors is not always easy for doctors, and an assistant that facilitates the interpretation process is an asset in the clinic. Computer vision techniques are devised to aid the clinic in detecting tumors based on a database of tumor c...
A new machine replacement policy based on number of defective items and Markov chains
A novel optimal single-machine replacement policy, using both a single-stage and a two-stage decision-making process, is proposed based on the quality of the items produced. In one stage of this policy, if the number of defective items in a sample of produced items exceeds an upper threshold, the machine is replaced. However, the machine is not replaced if the number of defective items is less than ...
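The threshold rule sketched in this abstract can be illustrated with a short function. The samples, threshold names, and the second-stage combination rule below are hypothetical assumptions for illustration; the paper's actual policy is only partially visible in the truncated text.

```python
def replace_machine(first_sample, upper, lower, second_sample=None):
    """Decide machine replacement from defective-item counts.

    Samples are 0/1 lists (1 = defective item). `upper`/`lower`
    thresholds and the second-stage rule are illustrative assumptions.
    """
    defects = sum(first_sample)
    if defects > upper:
        return "replace"        # too many defectives: replace the machine
    if defects < lower:
        return "keep"           # clearly acceptable: keep running
    # Borderline count: a second-stage sample settles the decision.
    if second_sample is None:
        return "second-stage"   # request another sample
    total = defects + sum(second_sample)
    return "replace" if total > 2 * upper else "keep"
```

A borderline first sample triggers the two-stage branch, which is the distinguishing feature the abstract attributes to this policy.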
Markov Chain Anticipation for the Online Traveling Salesman Problem by Simulated Annealing Algorithm
The arc costs are assumed to be online parameters of the network, and decisions must be made while the costs of the arcs are still unknown. The policies determine which nodes and arcs may be traversed, and they are generally defined according to the departure nodes of the current policy nodes. In tours created online, arc costs are not available to decision makers. The nodes traversed online are f...